Tentative Exploration on Reinforcement Learning Algorithms for Stochastic Rewards

Authors

  • Luis Peña
  • Antonio LaTorre
  • José María Peña Sánchez
  • Sascha Ossowski
Abstract

This paper addresses a way to generate mixed strategies using reinforcement learning algorithms in domains with stochastic rewards. A new algorithm based on the Q-learning model, called TERSQ, is introduced. Unlike other approaches for stochastic scenarios, TERSQ uses a single global exploration rate for all state/action pairs within the same run. This exploration rate is selected at the beginning of each run from a probability distribution, which is updated once the run is finished. In this paper we compare TERSQ with similar approaches that use probability distributions depending on individual state-action pairs. Two experimental scenarios have been considered. The first deals with the problem of learning the optimal way to combine several evolutionary algorithms used simultaneously by a hybrid approach. In the second, the objective is to learn the best strategy for a set of competing agents in a combat-based videogame.
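The core idea stated in the abstract, one exploration rate shared by all state/action pairs within a run, drawn from a distribution that is reweighted after the run finishes, can be sketched in Python. The environment interface (env.reset, env.step, env.actions), the candidate rates, and the weight-update rule below are assumptions made for illustration, not the published TERSQ procedure.

import random
from collections import defaultdict

# Illustrative sketch only (assumed interface and update rule, not the
# published TERSQ specification): Q-learning with one global exploration
# rate per run, sampled from a discrete distribution that is reweighted
# by the return observed during that run.

EPSILONS = [0.05, 0.1, 0.2, 0.4, 0.8]   # candidate global exploration rates
weights = [1.0] * len(EPSILONS)          # preference weights over those rates

def sample_epsilon_index():
    # Pick one exploration rate for the whole run.
    return random.choices(range(len(EPSILONS)), weights=weights)[0]

def run_episode(env, Q, epsilon, alpha=0.1, gamma=0.95):
    # Standard epsilon-greedy Q-learning episode with a fixed, global epsilon.
    state, done, total = env.reset(), False, 0.0
    while not done:
        actions = env.actions(state)
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, reward, done = env.step(action)
        best_next = max(Q[(next_state, a)] for a in env.actions(next_state))
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state, total = next_state, total + reward
    return total

def train(env, runs=500, eta=0.05):
    Q = defaultdict(float)
    for _ in range(runs):
        idx = sample_epsilon_index()              # one exploration rate per run
        ret = run_episode(env, Q, EPSILONS[idx])
        weights[idx] = max(weights[idx] + eta * ret, 1e-3)  # reweight, keep selectable
    return Q

The clamp on the weights keeps every candidate rate selectable even when returns are negative; how TERSQ actually updates its distribution over exploration rates is described in the full paper.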

Related articles

Learning Exploration Policies with Models (Conference on Automated Learning and Discovery, CONALD'98)

Reinforcement learning can greatly profit from world models updated by experience and used for computing policies. Fast discovery of near-optimal policies, however, requires focusing on "useful" experiences. Using an additional exploration model, we learn an exploration policy maximizing "exploration rewards" for visits to states that promise information gain. We augment this approach by an exte...
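As a rough illustration of the "exploration rewards" mentioned above, a common proxy for information gain is a count-based bonus that shrinks as a state is revisited. The sketch below is a generic stand-in under that assumption, not the exploration model used in the cited work.

import math
from collections import defaultdict

# Generic count-based exploration bonus (an assumed proxy for information gain).
visit_counts = defaultdict(int)

def exploration_reward(state, beta=1.0):
    # Large bonus for rarely visited states, decaying with further visits.
    visit_counts[state] += 1
    return beta / math.sqrt(visit_counts[state])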

Learning exploration strategies in model-based reinforcement learning

Reinforcement learning (RL) is a paradigm for learning sequential decision making tasks. However, typically the user must hand-tune exploration parameters for each different domain and/or algorithm that they are using. In this work, we present an algorithm called leo for learning these exploration strategies on-line. This algorithm makes use of bandit-type algorithms to adaptively select explor...
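A minimal sketch of the general flavour described here, a bandit choosing among candidate exploration strategies according to the returns they produce, is given below. The strategy names and the UCB1 rule are illustrative assumptions, not the actual leo algorithm.

import math

strategies = ["epsilon_greedy", "boltzmann", "optimistic_init"]  # hypothetical options
counts = [0] * len(strategies)
values = [0.0] * len(strategies)

def select_strategy(t):
    # UCB1: best observed mean return plus an exploration bonus (t >= 1).
    for i, c in enumerate(counts):
        if c == 0:
            return i                      # try each strategy at least once
    return max(range(len(strategies)),
               key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]))

def update_strategy(i, episode_return):
    # Incremental mean of the returns obtained under strategy i.
    counts[i] += 1
    values[i] += (episode_return - values[i]) / counts[i]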

Fifth International Conference on Simulation of Adaptive Behavior (SAB)

Model-Based Reinforcement Learning (MBRL) can greatly profit from using world models to estimate the consequences of selecting particular actions: an animat can construct such a model from its experiences and use it for computing rewarding behavior. We study the problem of collecting useful experiences through exploration in stochastic environments. Towards this end we use MBRL to maximize ex...

Reinforcement Learning mit adaptiver Steuerung von Exploration und Exploitation (Reinforcement Learning with Adaptive Control of Exploration and Exploitation)

Using computational models of reinforcement learning (RL), intelligent behavior based on sensorimotor interactions can be learned (Sutton and Barto, 1998). This way of learning is inspired by neurobiology and psychology, where an artificial agent performs actions within its environment, which responds with a reward signal describing the action's utility. Therefore, the natural objectiv...

Scaling Up Reinforcement Learning through Targeted Exploration

Recent Reinforcement Learning (RL) algorithms, such as RMAX, make (with high probability) only a small number of poor decisions. In practice, these algorithms do not scale well as the number of states grows because they spend too much effort exploring. We introduce an RL algorithm, State TArgeted R-MAX (STAR-MAX), that explores a subset of the state space, called the exploration envelop...


Journal title:

Volume   Issue

Pages  -

Publication date: 2009